Jupyter Magic
IPython magic is a feature natively integrated into Jupyter wherein functions can be turned into simple commands. For example, you can set matplotlib inline mode by adding a cell with:
%matplotlib inline
remotemanager also has support for magic, providing %%sanzu. A cell headed with this magic will have its contents run on a remote machine via a Dataset. First, we need to load the extension into this notebook:
[1]:
%load_ext remotemanager
Usage
At present this is mostly limited to doing simple tasks on a machine, though you can return results if you follow one simple rule. We’ll cover that later; for now, let’s start with a basic task, creating a folder:
[2]:
from remotemanager import URL
connection = URL()
dirname = 'test'
With the basic arguments set, we can now create our cell, which runs when we execute it.
The syntax is as follows: always start with %%sanzu, as this calls the function that does the work. You can then follow up with arguments for your Dataset. Note that we don’t need to specify function=... here, as the function is the cell.
If you need arguments for the cell (such as dirname here), prefix them with %%sargs, and then continue with anything you need for the cell.
It may look strange to have to specify the local and remote dirs for this runner, but remember that this is still running a Dataset behind the scenes. Everything you can specify there also works here.
Note
If you are familiar with scheduler jobscripts, you can consider %%sanzu as something like a pragma such as #SBATCH or #PBS.
[3]:
%%sanzu url = connection
%%sanzu local_dir = "temp_local"
%%sanzu remote_dir = "temp_remote"
%%sargs dirname = dirname
%%sargs dirname2 = "testme"
import os
os.mkdir(dirname)
os.mkdir(dirname2)
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
Now let’s see if our directory was created:
[4]:
'test' in connection.cmd('ls temp_remote').stdout
[4]:
True
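The header-line convention used in the cell above can be illustrated with a small parser. This is a minimal sketch, not remotemanager's actual implementation: split_cell is a hypothetical helper, it only handles one name = value pair per header line, and it keeps each value as unevaluated text.

```python
def split_cell(cell_text):
    """Split a cell into Dataset arguments, cell arguments and the code body.

    Hypothetical helper for illustration only; not remotemanager's parser.
    """
    dataset_args = {}  # collected from lines starting with %%sanzu
    cell_args = {}     # collected from lines starting with %%sargs
    body = []          # everything else is the code to run

    for line in cell_text.splitlines():
        stripped = line.strip()
        for marker, target in (("%%sanzu", dataset_args), ("%%sargs", cell_args)):
            if stripped.startswith(marker):
                # The text after the marker is "name = expression".
                name, _, expr = stripped[len(marker):].partition("=")
                target[name.strip()] = expr.strip()
                break
        else:
            # No marker matched, so this line belongs to the cell body.
            body.append(line)

    return dataset_args, cell_args, "\n".join(body)


example = '''%%sanzu url = connection
%%sanzu local_dir = "temp_local"
%%sargs dirname = dirname
import os
os.mkdir(dirname)'''

dataset_args, cell_args, body = split_cell(example)
print(dataset_args)
print(cell_args)
print(body)
```

The point of the sketch is the separation of concerns: %%sanzu lines configure the Dataset, %%sargs lines supply values to the cell, and the remainder is the function body.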
Run Behaviour
Much like a standard Dataset run, skip is enabled by default. This means that a cell will only execute once, storing the result. This saves on resource usage, but can be undesirable in some situations. Let’s demonstrate the skipping first:
[5]:
%%sanzu url = connection
7 * 7
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
[5]:
49
[6]:
%%sanzu url = connection
7 * 7
runner runner-0 already exists
Staging Dataset... No Runners staged
No Transfer required
Fetching results
No Transfer Required
[6]:
49
Note that the cell output warns us that the run was skipped: the runner already exists, and no Runners are staged.
We can disable this much like a normal Dataset, by setting skip=False or force=True:
[7]:
%%sanzu url = connection
%%sanzu skip = False
7 * 7
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
[7]:
49
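The skipping behaviour shown above can be pictured as a lookup keyed on the content of the run. The following is a toy sketch under that assumption only: TinyDataset is a hypothetical stand-in, the hashing scheme is illustrative, and eval stands in for remote execution. It does not describe how remotemanager identifies runners internally.

```python
import hashlib


class TinyDataset:
    """Toy stand-in for a Dataset; hypothetical, for illustration only."""

    def __init__(self):
        self._results = {}   # run identifier -> stored result
        self.executions = 0  # how many times code was actually run

    def run(self, source, skip=True, force=False):
        # Identify a run by a hash of its source (an assumption of this sketch).
        run_id = hashlib.sha256(source.encode()).hexdigest()
        if run_id in self._results and skip and not force:
            print("run already exists, returning stored result")
            return self._results[run_id]
        print("executing")
        self.executions += 1
        result = eval(source)  # stands in for remote execution
        self._results[run_id] = result
        return result


ds = TinyDataset()
first = ds.run("7 * 7")               # executes
second = ds.run("7 * 7")              # skipped, stored result returned
third = ds.run("7 * 7", skip=False)   # executes again
print(first, second, third, ds.executions)
```

In this sketch, passing force=True has the same effect as skip=False: the stored result is ignored and the code runs again.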
Besides anything your cell creates (such as the “test” directory earlier), a run leaves functional files for the Runner in the remote directory. These include a results file, which means…
Accessing results
By default the Jupyter cell will return the result as though you had run it normally. But what if we want to use this in a later cell?
This is possible by accessing the dataset after the run. The function inserts a magic_dataset attribute into the Jupyter runtime, which can be accessed later on.
[8]:
%%sanzu url=connection
"this string has come from the cell!"
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
[8]:
'this string has come from the cell!'
[9]:
print(magic_dataset.results)
['this string has come from the cell!']
Note that we follow the convention of Jupyter in that the last line is returned only if it is at the top level of indentation.
[10]:
%%sanzu url=connection, local_dir="temp_local", remote_dir="temp_remote"
if True:
    "nothing comes back"
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
[11]:
print(magic_dataset.results)
[None]
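The "last line at the top level" rule can be checked with Python's standard ast module. This is a minimal sketch of the rule itself, with returns_value being a hypothetical helper name; it is not taken from remotemanager's source.

```python
import ast


def returns_value(source):
    """Return True if the final top-level statement is a bare expression."""
    tree = ast.parse(source)
    # tree.body holds only top-level statements, so an expression nested
    # inside an `if` block never appears as the last element here.
    return bool(tree.body) and isinstance(tree.body[-1], ast.Expr)


print(returns_value("7 * 7"))
print(returns_value('if True:\n    "nothing comes back"'))
```

The first call reports True because the cell ends in a top-level expression, while the second reports False because the string sits inside the if block.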
The use case for magic is when you need to run some fairly quick jobs on a remote machine. It could be a run on the front end which grabs some data from a calculation. It could be that the remote machine has access to programs you didn’t install locally. The syntax of the magic is aimed at making it as clear as possible to the readers of your notebook what science you are doing, while somewhat transparently giving you access to powerful machines.
Pulling extra results
Aside from the base sanzu (which enables the tool) and sargs (which enables arg passthrough), there is a third option: spull.
This option flags objects within a cell for “pulling”, inserting them into the notebook namespace.
[12]:
%%sanzu url = connection
%%spull output
output = []
for i in range(5):
    output.append(i)
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
[12]:
{'output': [0, 1, 2, 3, 4]}
This also removes the need to access magic_dataset:
[13]:
output
[13]:
[0, 1, 2, 3, 4]
You are also not limited to a single output variable, and can have as many targets as you need. You also don’t need to ensure that every target exists, as missing ones simply return None:
[14]:
%%sanzu url = connection
%%spull output
%%spull val
%%spull foo
val = 10
output = []
for i in range(val):
    output.append(i)
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 2 Files... Done
[14]:
{'output': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'val': 10, 'foo': None}
[15]:
print(output)
print(val)
print(foo)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
10
None
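The pull behaviour, including None for missing targets, can be mimicked locally with a dictionary lookup. This is a rough sketch only: run_and_pull and notebook_namespace are hypothetical names, and exec on a local dictionary stands in for the remote run.

```python
def run_and_pull(source, targets):
    """Run source in a fresh namespace and collect the requested names."""
    namespace = {}
    exec(source, namespace)  # stands in for remote execution
    # dict.get returns None for names the cell never defined.
    return {name: namespace.get(name) for name in targets}


cell = """
val = 10
output = []
for i in range(val):
    output.append(i)
"""

pulled = run_and_pull(cell, ["output", "val", "foo"])
print(pulled)

# Copying the pulled names into another namespace mirrors how the
# targets become ordinary variables in later cells.
notebook_namespace = {}
notebook_namespace.update(pulled)
print(notebook_namespace["foo"])
```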
Errors
The goal of sanzu is to run a cell on a remote machine as though it were running locally. Ideally, it should be transparent, as though only the executor of the cell had changed. To this end, any errors raised on the remote are re-raised as a RuntimeError on the local side.
[16]:
%%sanzu url = connection
prin("test")
appended run runner-0
Staging Dataset... Staged 1/1 Runners
Transferring for 1/1 Runners
Transferring 5 Files... Done
Remotely executing 1/1 Runners
Fetching results
Transferring 1 File... Done
/home/test/remotemanager/remotemanager/decorators/magic.py:94: UserWarning: Sanzu encountered an exception, see below, or access magic_dataset.errors
warnings.warn(
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[16], line 1
----> 1 get_ipython().run_cell_magic('sanzu', 'url = connection', '\nprin("test")\n')
File ~/envs/py312/lib/python3.12/site-packages/IPython/core/interactiveshell.py:2541, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
2539 with self.builtin_trap:
2540 args = (magic_arg_s, cell)
-> 2541 result = fn(*args, **kwargs)
2543 # The code below prevents the output from being displayed
2544 # when using magics with decorator @output_can_be_silenced
2545 # when the last Python token in the expression is a ';'.
2546 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):
File ~/remotemanager/remotemanager/decorators/magic.py:98, in RCell.sanzu(self, line, cell, local_ns)
93 if ds.runners[0].is_failed:
94 warnings.warn(
95 "Sanzu encountered an exception, see below, "
96 "or access magic_dataset.errors"
97 )
---> 98 raise RuntimeError(ds.runners[0].error)
100 for name in spull:
101 logger.debug("looking for pull target %s", name)
RuntimeError: NameError: name 'prin' is not defined. Did you mean: 'print'?
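Because the remote error arrives as a RuntimeError whose message carries the original exception type, it can be handled like any other local exception. The sketch below imitates that pattern without a remote machine: run_remote and run_cell are hypothetical helpers, and the "TypeName: message" format is copied from the traceback above rather than from remotemanager's source.

```python
def run_remote(source):
    """Toy 'remote' executor: reports an error string instead of raising."""
    try:
        exec(source, {})
    except Exception as exc:
        # Only text survives the trip back, so record the type and message.
        return f"{type(exc).__name__}: {exc}"
    return None


def run_cell(source):
    """Re-raise any reported remote error locally as a RuntimeError."""
    error = run_remote(source)
    if error is not None:
        raise RuntimeError(error)


try:
    run_cell('prin("test")')
except RuntimeError as err:
    message = str(err)
    print(message)
```

Catching RuntimeError in this way lets a notebook continue past a failed cell, with the original exception name still visible in the message.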